Performance of KDB-Trees with Query-Based Splitting

Authors

Yves Lépouchard

John L. Pfaltz

Ratko Orlandic

Abstract

While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data involve only a small number of relevant dimensions. Unfortunately, the multidimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. A potentially very appealing idea, frequently suggested in the literature, is to adopt a node-splitting policy that takes into account the "importance" of individual dimensions, which could be determined either a priori or through a statistical sampling of actual queries. This paper presents the results of some carefully controlled experiments conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy is compared to a splitting policy that selects the split dimensions in a "cyclic" fashion, which has been shown to be very effective, especially in high-dimensional situations. Based on the results, query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.
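The two split-dimension policies compared above can be illustrated with a small sketch. It models only the choice of the split dimension when a node overflows, not a full KDB-tree. The cyclic rule (next dimension in round-robin order, driven here by a simple global split counter) follows the description above; the query-based rule is a hypothetical weighted round-robin in which each dimension's importance is its share of restrictions in a sample of partially specified queries. The function names (`cyclic_dimension`, `query_weights`, `query_based_dimension`) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter
from typing import List, Optional, Sequence


def cyclic_dimension(split_count: int, num_dims: int) -> int:
    """Cyclic policy: split on dimensions 0, 1, ..., d-1, 0, 1, ... in turn."""
    return split_count % num_dims


def query_weights(query_sample: Sequence[Sequence[int]], num_dims: int) -> List[float]:
    """Hypothetical importance estimate from a sample of partial-match queries.

    Each query is given as the list of dimensions it restricts; a dimension's
    weight is its share of all (query, dimension) restrictions in the sample.
    Falls back to uniform weights when the sample restricts nothing.
    """
    counts = Counter(dim for query in query_sample for dim in set(query))
    total = sum(counts.values())
    if total == 0:
        return [1.0 / num_dims] * num_dims
    return [counts[d] / total for d in range(num_dims)]


def query_based_dimension(splits_per_dim: Sequence[int],
                          weights: Sequence[float]) -> Optional[int]:
    """Hypothetical query-based policy (weighted round-robin).

    Picks the dimension whose number of past splits lags furthest behind
    its weighted share of all splits, including the one being made now.
    Dimensions with zero weight are never chosen; ties go to the lowest index.
    """
    total_splits = sum(splits_per_dim) + 1
    best, best_deficit = None, float("-inf")
    for d, w in enumerate(weights):
        if w == 0:
            continue
        deficit = w * total_splits - splits_per_dim[d]
        if deficit > best_deficit:
            best, best_deficit = d, deficit
    return best


if __name__ == "__main__":
    # Sample of four queries over a 4-dimensional space; dimension 3 is never queried.
    weights = query_weights([[0], [0, 1], [0], [2]], num_dims=4)
    splits = [0, 0, 0, 0]
    chosen = []
    for _ in range(5):
        d = query_based_dimension(splits, weights)
        splits[d] += 1
        chosen.append(d)
    print(weights)  # [0.6, 0.2, 0.2, 0.0]
    print(chosen)   # [0, 1, 0, 2, 0]
    print([cyclic_dimension(i, 4) for i in range(5)])  # [0, 1, 2, 3, 0]
```

With a query sample dominated by dimension 0, the query-based rule concentrates splits on that dimension and never splits on dimension 3, while the cyclic rule spreads splits evenly over all four dimensions regardless of the workload.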

Similar resources

Improvement of Filtering Algorithm for RFID Middleware Using KDB-tree Query Index

RFID middleware collects and filters RFID streaming data gathered continuously by numerous readers to process requests from applications. These requests are called continuous queries. The problem when using any of the existing query indexes on these continuous queries is that it takes a long time to build the index because it is necessary to insert a large number of segments into the index. KDB...

A Fast Algorithm for high-dimensional Similarity Joins

Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε-kdB tree, for fast spatial similarity joins on high-dimensional points. This index structure reduces the number of neighboring leaf nodes that are considered for the join test, as well as the traversal cost of nd...

High Dimensional Feature Indexing Using Hybrid Trees

Feature based similarity search is emerging as an important search paradigm in database systems. The technique used is to map the data items as points into a high dimensional feature space which is indexed using a multidimensional data structure. Similarity search then corresponds to a range search over the data structure. Traditional multidimensional data structures (e.g., R-tree, KDB-tree, gr...

Linear R-tree Revisited

The problem of finding an optimal splitting of overflowed nodes has a major influence on query performance of the R-tree spatial index structure. Most of the previous split heuristics of R-tree-based index structures have quadratic time and face the problem of increasing overlap of the resulting minimum bounding rectangles (MBRs). In this paper, we propose an efficient heuristic method for hand...